File Redundancy Issues in Distributed Database Systems

Authors

  • Shojiro Nishio
  • Toshihide Ibaraki
  • Hidehiro Miyajima
  • Toshiharu Hasegawa
Abstract

This paper treats the file redundancy issue in distributed database systems, asking what the optimal number of file copies is, given the ratio r of the frequencies of query and update requests. To draw a general conclusion applicable to a wide variety of practical distributed database systems, simplified network models are constructed, and the optimal number of file copies, as well as their locations, that minimizes the communication cost is computed. By examining various network types, we plot the optimal number of file copies as a function of the ratio r. Our conclusion is that a single copy suffices under moderate conditions, and that it is disadvantageous to have more than a few copies unless the frequency of query requests is far higher (e.g., 50 times) than that of update requests.

1. Introduction

In many papers discussing concurrency control mechanisms and the performance of distributed database systems (abbreviated to DDBS), full redundancy of the data files is often assumed (i.e., each site in the system has a complete copy of every database (file)). Although it is theoretically easy to treat systems under the full redundancy assumption, this assumption should be carefully justified from the viewpoint of system performance. The conclusion of this paper is quite the contrary: a single copy of the file suffices in most practical cases, and full redundancy cannot be justified unless the frequency of query requests is extremely high compared with that of update requests.

The optimal file redundancy and allocation problem in DDBS has been discussed in a number of papers (e.g., see the survey by Dowdy et al. [1]). However, most of these papers emphasize the precise formulation of particular DDBS models. Although these are useful for determining the optimal file redundancy and allocation of a given system, the computed results may not be general enough to suggest the characteristics common to many practical DDBS models.

To discuss system performance in a general manner, the operating costs of DDBS are classified into storage costs of files and communication costs for queries and updates. High redundancy tends to decrease the communication costs for queries, but increases the storage costs and the communication costs for updates. The ratio r of the frequencies of update requests to query requests is a key parameter in determining the optimal redundancy of a given system. We further place the following simplifying …
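
The trade-off between query and update communication costs can be illustrated with a small numerical sketch. This is not the authors' model: it assumes a ring of n sites with unit cost per hop, identical request rates at every site, queries served by the nearest copy, updates sent separately to every copy, and no storage cost. The function names (`ring_distance`, `communication_cost`, `optimal_placement`) are hypothetical, and the exhaustive search is only practical for small n.

```python
from itertools import combinations


def ring_distance(a, b, n):
    """Hop count between sites a and b on a ring of n sites."""
    d = abs(a - b)
    return min(d, n - d)


def communication_cost(copies, n, query_rate, update_rate):
    """Total communication cost per unit time for one copy placement.

    Assumed cost model (not taken from the paper): each site issues
    queries and updates at the given rates, a query travels to the
    nearest copy, an update is sent separately to every copy, and
    each hop costs 1.
    """
    query_cost = sum(
        min(ring_distance(s, c, n) for c in copies) for s in range(n)
    )
    update_cost = sum(
        ring_distance(s, c, n) for s in range(n) for c in copies
    )
    return query_rate * query_cost + update_rate * update_cost


def optimal_placement(n, query_rate, update_rate):
    """Exhaustively search all non-empty placements; ties keep fewer copies."""
    best = None
    for k in range(1, n + 1):
        for copies in combinations(range(n), k):
            cost = communication_cost(copies, n, query_rate, update_rate)
            if best is None or cost < best[0]:
                best = (cost, copies)
    return best


# Sweep the query-to-update ratio on an 8-site ring and report the
# number of copies in the cheapest placement.
for ratio in (1, 3, 10, 1000):
    cost, copies = optimal_placement(8, query_rate=ratio, update_rate=1)
    print(f"queries per update = {ratio}: {len(copies)} copies, cost {cost}")
```

In this toy model the number of copies grows only as queries come to dominate updates, which is the qualitative behaviour the text describes. The specific thresholds depend entirely on the assumed topology and cost model and should not be read as the paper's results.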

Related resources

Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions

We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous bugs related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-sy...

Exploiting the overlap between temporal redundancy and spatial redundancy in storage system

Recent years have seen ever increasing file systems and storage applications employ versioning or log-based techniques to protect data integrity and improve the overall system performance, in addition to traditional database systems. The basic idea behind these techniques is to add extra data redundancy before committing new updates on disks. More specifically, in database systems, the log reco...

Task allocation in Distributed computing VS distributed database systems : A Comparative study

Task allocation in distributed computing systems (DCS) is an important research problem. When the resource to be shared in a DCS is a database, the system is classified as a distributed database system (DDBS). In a DDBS, data and operation allocation are closely interrelated and highly dependent on each other. Here it is represented along with model of allocation and development of such a model ...

CSAR-2: a Case Study of Parallel File System Dependability

Modern cluster file systems such as PVFS that stripe files across multiple nodes have been shown to provide high aggregate I/O bandwidth but are prone to data loss, since the failure of a single disk or server affects the whole file system. To address this problem a number of distributed data redundancy schemes have been proposed that represent different trade-offs between performance, storage effici...

Research Directions in Parallel I/O for Clusters

Parallel I/O remains a critical problem for cluster computing. A significant number of important applications need high performance parallel I/O and most cluster systems provide enough hardware to deliver the required performance. System software for achieving the desired goals remains in the research and development stage. A number of parallel file systems have achieved remarkable goals in one...

Publication date: 1983